Policy iteration: meaning in Chinese

策略迭代法 (policy iteration method)
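In reinforcement learning, policy iteration alternates two steps: policy evaluation (compute the value of the current policy) and greedy policy improvement (pick the best action with respect to those values). It stops when the policy no longer changes. The following is a minimal sketch in Python. It assumes a small, hypothetical three-state deterministic chain MDP, and all function names are illustrative only.

```python
from typing import Dict, List, Tuple

# (probability, next_state, reward)
Transition = Tuple[float, int, float]
# mdp[state][action] -> list of possible transitions (hypothetical format)
MDP = Dict[int, Dict[int, List[Transition]]]


def q_value(mdp: MDP, v: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """Expected one-step return of action a in state s, bootstrapping from v."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in mdp[s][a])


def policy_evaluation(mdp: MDP, policy: Dict[int, int], gamma: float,
                      tol: float = 1e-10) -> Dict[int, float]:
    """Iterative in-place evaluation of a fixed deterministic policy."""
    v = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            new_v = q_value(mdp, v, s, policy[s], gamma)
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


def policy_improvement(mdp: MDP, policy: Dict[int, int], v: Dict[int, float],
                       gamma: float, eps: float = 1e-9) -> Dict[int, int]:
    """Greedy improvement; keeps the current action unless another is strictly better."""
    new_policy = {}
    for s in mdp:
        best = policy[s]
        best_q = q_value(mdp, v, s, best, gamma)
        for a in mdp[s]:
            q = q_value(mdp, v, s, a, gamma)
            if q > best_q + eps:
                best, best_q = a, q
        new_policy[s] = best
    return new_policy


def policy_iteration(mdp: MDP, gamma: float = 0.9):
    policy = {s: min(mdp[s]) for s in mdp}  # arbitrary start: lowest action id
    while True:
        v = policy_evaluation(mdp, policy, gamma)
        new_policy = policy_improvement(mdp, policy, v, gamma)
        if new_policy == policy:  # policy is stable, so stop
            return policy, v
        policy = new_policy


# Hypothetical 3-state chain: action 0 = "left", 1 = "right"; state 2 is absorbing.
chain: MDP = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}

policy, values = policy_iteration(chain)
print(policy)  # {0: 1, 1: 1, 2: 0}
```

The small `eps` threshold in the improvement step keeps the current action on ties. This prevents the loop from oscillating between equally good actions. Generalized policy iteration (GPI) loosens this scheme: evaluation and improvement may interleave in partial, approximate steps rather than each running to completion.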

The ant colony optimization (ACO) algorithms are fitted into the framework of generalized policy iteration (GPI) in reinforcement learning (RL), based on incomplete information about the Markov state. Furthermore, we show that the pheromone update in the ACS and Ant-Q algorithms is based on Monte Carlo (MC) methods, or on a formal combination of MC methods and temporal-difference (TD) methods.
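此外在强化学习的理论框架内说明了AS算法是一种基于蒙特卡洛方法的强化学习算法，ACS和Ant-Q算法是一种蒙特卡洛方法与瞬时差分方法在形式上相结合的强化学习算法。
(Moreover, within the theoretical framework of reinforcement learning, the AS algorithm is shown to be an RL algorithm based on Monte Carlo methods, and the ACS and Ant-Q algorithms are RL algorithms that formally combine Monte Carlo and temporal-difference methods.)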
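The example sentence contrasts Monte Carlo (MC) updates with temporal-difference (TD) updates. The following sketch is a rough, generic illustration that uses the standard tabular update rules. It is not the pheromone update rule from the cited work. It applies constant-α every-visit MC and TD(0) to the same hypothetical two-step episode. MC moves each estimate toward the full observed return. TD(0) instead bootstraps from the current estimate of the next state. The state names, the episode, and the function names are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# (state, reward, next_state) for one recorded step
Step = Tuple[str, float, str]


def mc_update(v: Dict[str, float], episode: List[Step],
              alpha: float, gamma: float) -> Dict[str, float]:
    """Every-visit constant-alpha MC: move V(s) toward the full return G_t."""
    g = 0.0
    returns = []
    for s, r, _ in reversed(episode):  # accumulate returns from the end
        g = r + gamma * g
        returns.append((s, g))
    for s, g in reversed(returns):     # apply updates in time order
        v[s] = v.get(s, 0.0) + alpha * (g - v.get(s, 0.0))
    return v


def td0_update(v: Dict[str, float], episode: List[Step], alpha: float,
               gamma: float, terminal: Set[str]) -> Dict[str, float]:
    """TD(0): move V(s) toward r + gamma * V(s'), a bootstrapped target."""
    for s, r, s2 in episode:
        next_v = 0.0 if s2 in terminal else v.get(s2, 0.0)
        target = r + gamma * next_v
        v[s] = v.get(s, 0.0) + alpha * (target - v.get(s, 0.0))
    return v


# Hypothetical episode: A -> B (reward 0), then B -> END (reward 1).
episode = [("A", 0.0, "B"), ("B", 1.0, "END")]

mc = mc_update({}, episode, alpha=0.5, gamma=1.0)
td = td0_update({}, episode, alpha=0.5, gamma=1.0, terminal={"END"})
print(mc)  # {'A': 0.5, 'B': 0.5}
print(td)  # {'A': 0.0, 'B': 0.5}
```

After one episode, MC already credits state A with the final reward. TD(0) leaves A at 0 because B's estimate was still 0 when A was updated. A would pick up value only on later episodes, as B's estimate grows.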